Symmetry consideration in identifying network structures

The topological information of a network can be retrieved equivalently from its complement
consisting of the same nodes but complementary edges. Hence the partition of a network into certain
substructures based on given criteria should be the same as that of its complement based on the equiv- 
alent criteria if the topological information is considered exclusively. This symmetry of partitioning 
between a network and its complement is due to the equivalence of their topological information and 
hence should be respected regardless of the detailed characteristics of the substructures considered. 
In this work we suggest this symmetry consideration as a general guideline and propose a symmetric 
community detection scheme to show its implications. Our method has no resolution limit and can 
be used to detect hierarchical community structures at different levels. Our study also suggests that
the community structure is unlikely to be a result of random fluctuations in large networks.

In the last decade complex networks have been extensively
studied with the aim of revealing and understanding their
structures at various scales. Besides general statistical
properties such as the small-world and scale-free
properties, the significance of some common structural
features at the mesoscopic level has also been realized.
The mesoscopic structures that have received intensive
study include communities [3] and similar groups,
i.e. node sets whose components have similar
connection patterns. These mesoscopic structures are of
scientific interest because they may be closely related
to certain behavioral or functional units of the system,
and they also provide an ideal basis for reduction
or coarse-graining of networks, which could be
particularly useful in dealing with the huge networks
often encountered nowadays. Furthermore, these
substructures have important implications for various
dynamical processes on the networks.

However, in spite of the efforts and the fast progress made
in this field, the detection of these substructures still re- 
mains challenging. (Here we restrict ourselves to net- 
works consisting of these mesoscopic structures exclu- 
sively, such that the problem of detection is equivalent 
to that of partitioning.) One conceptual difficulty is the 
ambiguity in the definition of these substructures [J], and
the question of which characterizations are essential to
them is not yet thoroughly understood. In this
Letter we suggest a symmetry that should be taken into 
account in the definition of network structures. It does 
not address the details of individual substructures and 
their characterizations, but is a property of networks. It 
provides a consistency criterion with which the network 
structures can be specified more precisely. The detection 
of network structures can then be improved as a result. 

This symmetry originates from the dual nature of connection
states in networks. Consider a network of $N$ nodes whose
connection topology is encoded in the adjacency matrix $A$,
with $A_{ij} = 1$ if nodes $i$ and $j$ are connected and
$A_{ij} = 0$ otherwise. Obviously, the topological information
contained in $A$ is completely equivalent to that in its
complement $\bar{A}$, related to $A$ via the one-to-one map
$\bar{A}_{ij} = 1 - A_{ij}$. (The network corresponding to
$\bar{A}$ is referred to as the complement of the network
corresponding to $A$.) Due to this equivalence, it is natural
to expect that any structure recognized in network $A$ based
on certain characterizations should also be recognized in
network $\bar{A}$ based on the same or equivalent
characterizations. By way of analogy, this is similar to
recognizing a face in a photo: given its features, the face
can be recognized equally well in the negative film. This
equivalence provides a way to check whether the
characterizations used for defining a structure are
consistent. Only those characterizations (and their
equivalents) on the basis of which the same structures can be
recognized in a network and in its complement are regarded as
consistent and acceptable. We suggest that this symmetry
principle be adopted as a necessary condition in defining
network structures.
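
The one-to-one nature of the complement map can be illustrated with a minimal Python sketch. The four-node path graph and the helper name `complement` are hypothetical choices made for illustration. The map is applied literally to every entry, so the diagonal of the complement becomes 1; whether the diagonal should be treated this way is an assumption of the sketch.

```python
def complement(adj):
    """Apply the entrywise map A_ij -> 1 - A_ij to a 0/1 adjacency matrix."""
    return [[1 - x for x in row] for row in adj]

# Hypothetical example: the path 0-1-2-3.
A = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

A_bar = complement(A)

# Complementing twice returns the original matrix, so the map loses
# no topological information.
recovered = complement(A_bar)
```

Since `recovered` equals `A`, either matrix can be used to reconstruct the other, which is the equivalence the symmetry argument relies on.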

This symmetry consideration has not been adopted as
a general guideline in most investigations. In a recent
study [7] a symmetric definition of similar groups was
proposed. It was found that the symmetric definition
can indeed overcome some difficulties encountered with
the asymmetric definition [6]. Moreover, the symmetric
definition can be extended to networks with weighted
connection information, resulting in a new perspective on
the role of weights in the problem. It is interesting to
notice that a symmetric definition of the community in a 
general spectral detection algorithm has also been found 
to outperform the asymmetric definitions [l^ . 

To demonstrate the power of this symmetry guideline,
in the following we apply it to the community detection
problem by constructing a symmetric quality function of
partition. An asymmetric version, which has so far been
the most popular [3], is that suggested by Newman and
Girvan:

$$ q_{ng} = \sum_{a=1}^{C} \left[ \frac{l_a}{L} - \left( \frac{d_a}{2L} \right)^2 \right], $$

where $L$ is the total number of links in the network, $d_a$
is the total degree of the nodes in module $a$, and $l_a$ is the
number of internal links of module $a$. The summand
represents how much the fraction of links inside a module
exceeds what is expected in the null model
of $A$, i.e. random networks sharing the same nodes and
the same degree sequence. For convenience let us denote
by $M_{ng}^C$ the maximum of $q_{ng}$ over all possible partitions
containing $C$ modules; then the modularity, $M_{ng}$,
is defined as the maximum value of $M_{ng}^C$ over all allowed
$C$ values, and the corresponding partition is regarded as
the optimal community partition of network $A$ [13].

FIG. 1: (color online) The optimal community partitioning
of the karate network [l^ corresponding to the maximum
value of the symmetric quality function for a given number of
modules $C$ = 2, 3 and 4, represented by the partition lines $b_2$
($C = 2$), $b_2$ and $b_3$ ($C = 3$), and $b_2$, $b_3$ and $b_4$ ($C = 4$),
respectively. The largest symmetric modularity corresponds to the
partition given by $b_2$ and $b_3$ (see Fig. 2(b)).

The concept of modularity is an important contribution
to the definition and detection of communities in
networks [4]. Modularity maximization has itself
been developed into a popular method, and many algorithms
have been developed for this purpose. Some
issues, however, remain to be addressed. One is that
modularity-based methods have a resolution limit $\sim \sqrt{L}$
that prevents them from identifying communities smaller
than this limit; another is that modularity may attain
fairly large values when applied to partitioning
random networks [15], making its meaning elusive [4].

We consider here instead a symmetric quality function
of partition:
Here $N_a$ is the number of nodes in module $a$, and $d_a^{in}$ ($d_a^{out}$)
is the total degree of the nodes in module $a$ corresponding
to their connections to themselves (other modules).
The summand reflects the difference between the average
number of edges by which a node inside a module and a node
outside it connect to the nodes in that module. For the
partition $\pi^1$ in which all $N$ nodes are assigned to a single module
($C = 1$), it can be naturally extended to $q(A, \pi^1) = d/N^2$
($d$ is the total degree of all nodes). Evidently, $q$ thus
defined is symmetric; i.e. $q(A, \pi^C) = -q(\bar{A}, \pi^C)$.

FIG. 2: (color online) The analysis of the karate network [Tg].
The comparison of $Q^C$ and $\langle Q_r^C \rangle$ (a) and that of $M^C$ and
$F^C$ (b), where $\langle Q_r^C \rangle$ and $F^C$ are evaluated over 10* random
networks with the same degree sequence. (c) and (d) show the
distribution of $Q_r^C$ for $C = 3$ and $C = 10$, respectively. Solid
curves are Gaussians with the same averages and deviations.

Of all the possible partitions that have $C$ modules, the
one that generates the maximum $q$ value, denoted by $\pi^C$,
is regarded as the optimal community partition for the given
$C$. This is in agreement with our expectations for a good
community partition. As $q(A, \pi^C) = -q(\bar{A}, \pi^C)$, finding
the optimal partition with $C$ modules in $A$ by maximizing $q$
is equivalent to minimizing $q$ in $\bar{A}$. For this reason $q$ is
more consistent. For convenience we denote $Q^C = q(A, \pi^C)$.
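
The antisymmetry under complementation can be checked numerically with a small sketch. The explicit form used here is an assumption, chosen to match the verbal description (an internal link density minus an external one), the stated $C = 1$ extension $d/N^2$, and the required sign flip: each module contributes $d_a^{in}/N_a^2 - d_a^{out}/[N_a(N - N_a)]$, and the sum is divided by $C$. The $1/C$ normalization, the function names, and the convention that the complement has ones on its diagonal are all assumptions of this sketch, not statements about the original formula.

```python
from fractions import Fraction

def complement(adj):
    """Entrywise map A_ij -> 1 - A_ij (diagonal included; an assumption)."""
    return [[1 - x for x in row] for row in adj]

def symmetric_q(adj, labels):
    """Assumed symmetric quality: mean over modules of
    d_in / N_a^2 - d_out / (N_a * (N - N_a))."""
    n = len(adj)
    modules = set(labels)
    total = Fraction(0)
    for a in modules:
        inside = [i for i in range(n) if labels[i] == a]
        outside = [i for i in range(n) if labels[i] != a]
        # d_in counts every internal link twice, like a degree sum.
        d_in = sum(adj[i][j] for i in inside for j in inside)
        d_out = sum(adj[i][j] for i in inside for j in outside)
        total += Fraction(d_in, len(inside) ** 2)
        if outside:  # for C = 1 there is no outside and q reduces to d / N^2
            total -= Fraction(d_out, len(inside) * len(outside))
    return total / len(modules)

# Hypothetical example: two triangles (0,1,2) and (3,4,5) joined by the link 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

labels = [0, 0, 0, 1, 1, 1]
q_A = symmetric_q(A, labels)
q_A_bar = symmetric_q(complement(A), labels)
```

For this example each triangle contributes 6/9 - 1/9, so `q_A` is 5/9 and `q_A_bar` is -5/9. Exact rational arithmetic is used so that the sign flip can be compared without rounding error.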

As an example, Fig. 1 shows the optimal partition $\pi^C$
of the karate network [l^ with $C$ = 2, 3 and 4. Indeed
they are consistent with our intuition of communities, and
this has been the case in all other networks we have
investigated. Numerically we employ an accurate and very
efficient fusion algorithm (AdClust) [13] with a slight
modification. Initially each node constitutes a module. Each
subsequent fusion step consists of two operations. First,
for each node one finds the target module into which moving
the node generates the maximum positive increase
of the $q$ value. If such a module exists, the node is
moved into it. After all the nodes have been considered
(one by one and in a random order), this process
is repeated with a new random order until all nodes are
stable. Next, among all possible module pairs one finds the
pair whose merger leads to the maximum increase
(or minimum decrease) of the $q$ value and then combines
them. These two operations are repeated until all the
modules have merged into one. During this process we
obtain a series of partitions with different numbers of
modules, which are regarded as good approximations
of the optimal partitions $\pi^C$. Careful studies have shown
that different node orders taken in the first operation
may lead to different partition results. For this reason
10^ ~ fO^ 'random realizations' are performed in our
calculations, and the largest $q$ values and the corresponding
partitions are chosen as the final approximations
of $Q^C$ and $\pi^C$. We have also checked the results obtained
in this way against the simulated annealing algorithm [18]
and found that they cannot be improved any further.
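
The two operations of a fusion step can be sketched as follows. This is a plain, unoptimized illustration of the procedure as described in the text, not the AdClust implementation. The function names are hypothetical, and the quality function is an assumed form (internal minus external link density per module, averaged over modules) standing in for the symmetric quality function.

```python
import random
from fractions import Fraction

def symmetric_q(adj, labels):
    """Assumed quality function; see the assumptions stated above."""
    n = len(adj)
    modules = set(labels)
    total = Fraction(0)
    for a in modules:
        inside = [i for i in range(n) if labels[i] == a]
        outside = [i for i in range(n) if labels[i] != a]
        d_in = sum(adj[i][j] for i in inside for j in inside)
        d_out = sum(adj[i][j] for i in inside for j in outside)
        total += Fraction(d_in, len(inside) ** 2)
        if outside:
            total -= Fraction(d_out, len(inside) * len(outside))
    return total / len(modules)

def fuse(adj, quality=symmetric_q, seed=0):
    """Greedy fusion; returns {C: (q, labels)} for the partitions visited."""
    rng = random.Random(seed)
    n = len(adj)
    labels = list(range(n))  # initially each node is its own module
    found = {}

    def record():
        c, q = len(set(labels)), quality(adj, labels)
        if c not in found or q > found[c][0]:
            found[c] = (q, labels[:])

    while True:
        # Operation 1: move single nodes while some move strictly increases q.
        moved = True
        while moved:
            moved = False
            order = list(range(n))
            rng.shuffle(order)  # a new random node order on every sweep
            for i in order:
                base, best_gain, best_target = quality(adj, labels), 0, None
                for target in set(labels) - {labels[i]}:
                    trial = labels[:]
                    trial[i] = target
                    gain = quality(adj, trial) - base
                    if gain > best_gain:
                        best_gain, best_target = gain, target
                if best_target is not None:
                    labels[i] = best_target
                    moved = True
        record()
        modules = sorted(set(labels))
        if len(modules) == 1:
            return found
        # Operation 2: merge the module pair giving the largest increase
        # (or the smallest decrease) of q.
        best_q, best_pair = None, None
        for x in range(len(modules)):
            for y in range(x + 1, len(modules)):
                trial = [modules[x] if m == modules[y] else m for m in labels]
                q = quality(adj, trial)
                if best_q is None or q > best_q:
                    best_q, best_pair = q, (modules[x], modules[y])
        keep, drop = best_pair
        labels = [keep if m == drop else m for m in labels]

# Hypothetical example: two triangles joined by the link 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for i, j in edges:
    adj[i][j] = adj[j][i] = 1

found = fuse(adj, seed=1)
```

Because the node order is random, different seeds can visit different partitions. Repeating `fuse` over many seeds and keeping the largest `q` for each `C` would mirror the 'random realizations' mentioned in the text.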

Next, let us find out which of the partitions with
different numbers of modules, $\{\pi^C, C = 1, 2, \ldots\}$,
could be the most relevant. For this purpose we
consider the null model, i.e. random networks that share
the same degree sequence with the network considered,
and define the symmetric modularity for a given $C$ as

$$ M^C = \frac{Q^C - \langle Q_r^C \rangle}{\langle Q_r^C \rangle}. $$

Here $Q_r^C$ is the maximum $q$ value for the optimal partition
of a random network of the null model, and $\langle Q_r^C \rangle$ is the
corresponding average over all such networks. $M^C$
measures how much more modular the communities found in
the original network are compared with those found
in the corresponding random networks. If the communities
are seen as certain ordered structures, then $M^C$ also
reflects how orderly the communities found are
compared with their counterparts arising from purely random
fluctuations. The overall modularity is thus defined as
$M = \max\{M^C, C = 1, 2, \ldots\}$, and the corresponding
partition is assumed to be the most relevant.
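
The selection of the most relevant $C$ can be sketched in a few lines, reading the symmetric modularity as the relative excess $M^C = (Q^C - \langle Q_r^C \rangle)/\langle Q_r^C \rangle$, with the null-model average in the denominator as the text states. The function names and all numbers below are hypothetical and serve only to illustrate the arithmetic; the sketch also assumes that the null-model average is positive.

```python
from statistics import mean

def symmetric_modularity(q_real, q_random):
    """M^C = (Q^C - <Q_r^C>) / <Q_r^C> for one value of C."""
    avg = mean(q_random)
    return (q_real - avg) / avg

def most_relevant(q_by_c, random_by_c):
    """Return the C with the largest M^C together with all M^C values."""
    m_by_c = {c: symmetric_modularity(q_by_c[c], random_by_c[c])
              for c in q_by_c}
    best_c = max(m_by_c, key=m_by_c.get)
    return best_c, m_by_c

# Hypothetical inputs: Q^C of the real network and a few null-model
# samples of Q_r^C for each C (not values from the paper).
q_by_c = {2: 0.30, 3: 0.45, 4: 0.40}
random_by_c = {2: [0.24, 0.26], 3: [0.24, 0.26], 4: [0.29, 0.31]}

best_c, m_by_c = most_relevant(q_by_c, random_by_c)
```

With these inputs the null averages are 0.25, 0.25 and 0.30, giving $M^C$ of about 0.2, 0.8 and 0.33, so `best_c` is 3. Normalizing by the null average matters because that average itself varies with $C$.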

Fig. 2 shows the analysis of the karate network as
an example. There we have considered 10"' random networks
of the null model, generated with the rewiring technique
[ll]. It can be seen in Fig. 2(c) and (d) that the
distribution of $Q_r^C$ is very close to Gaussian and hence can
be well characterized by its average $\langle Q_r^C \rangle$ and deviation
$\delta_{Q_r^C}$. Meanwhile $\langle Q_r^C \rangle$ is a function of $C$ (Fig. 2(a)); this
is the reason why it is introduced as the denominator in
the definition of $M^C$. The results for $M^C$ (Fig. 2(b))
suggest that the partition into three communities ($C = 3$;
see Fig. 1 for the partition) is the most relevant. We
have also studied the dolphin network [2^, and the most
relevant partition ($C = 2$, $M = 0.6258$) is found to be
exactly the same as the natural split observed [21]. For
another popular test network, that of the American college
football teams [2^, our method suggests the partition into
10 communities ($C = 10$, $M = 1.3345$).
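
Null-model networks with a fixed degree sequence are commonly generated by repeated double edge swaps, and the sketch below shows one such realization. Whether this is exactly the rewiring variant used in the cited reference is an assumption; the function name `rewire`, its parameters, and the example graph are hypothetical.

```python
import random

def rewire(edges, n_swaps, seed=0):
    """Degree-preserving randomization of a simple graph by double edge swaps.

    Two links (a, b) and (c, d) are replaced by (a, d) and (c, b) unless
    that would create a self-loop or a duplicate link.
    """
    rng = random.Random(seed)
    edges = [tuple(sorted(e)) for e in edges]
    present = set(edges)
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:  # pick one of the two possible pairings
            c, d = d, c
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        if a == d or c == b or new1 in present or new2 in present:
            continue  # rejected: self-loop or multi-link
        present -= {edges[i], edges[j]}
        present |= {new1, new2}
        edges[i], edges[j] = new1, new2
        done += 1
    return edges

def degree_sequence(edges, n):
    """Degree of every node 0..n-1 for an undirected edge list."""
    deg = [0] * n
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

# Hypothetical example: a 6-node ring with one chord.
original = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5), (0, 3)]
randomized = rewire(original, n_swaps=20, seed=1)
```

Each accepted swap leaves the degree of all four end nodes unchanged, so `degree_sequence(randomized, 6)` equals `degree_sequence(original, 6)`. The cap on attempts only guards against graphs in which few swaps are admissible.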

The fact that the distribution of $Q_r^C$ is Gaussian allows
us to define another useful quantity, $F^C$,
which gives how 'modular' a random network (of the null
model) can be as a result of fluctuations. Obviously only
those partitions of the original network whose $M^C$ exceeds $F^C$
may suggest meaningful community structures (see Fig.
2(b) for a comparison of $M^C$ and $F^C$ in the karate
network). This should be seen as a necessary condition for
the communities defined with the symmetric modularity,
and it concludes our community detection scheme.

FIG. 3: (color online) The results of $Q^C$ and $\langle Q_r^C \rangle$ (a), and $M^C$
and $F^C$ (b), for a network of 30 cliques on a circle. Each
clique has 3 nodes and neighboring cliques are connected
by one link. The partition corresponding to the largest $M^C$
($C = 30$) accurately assigns each clique to a single module.

Now let us discuss two useful properties of the symmetric
modularity. First, the community detection method
based on it has no resolution limit. As an example [14]
we consider a network of $\mathcal{N}$ cliques placed on a circle. Each
clique contains 3 nodes, the smallest size for a meaningful
module, and any two neighboring cliques are linked
by one edge. Our scheme can identify all cliques (for
$\mathcal{N} > 2$) without any ambiguity (see Fig. 3 for $\mathcal{N} = 30$
as an example). In this simulation (and also in those for
Fig. 4 and Fig. 5) $\langle Q_r^C \rangle$ and $F^C$ are evaluated over 10^
random networks with the same degree sequence. By
comparison, the method with the asymmetric modularity
$M_{ng}$ suggests instead the partition into 10 communities,
each containing 3 neighboring cliques ($M_{ng} = 49/60$),
due to its inherent resolution limit. ($M_{ng}^C = 97/120$ and
43/60 for $C = 15$ and $C = 30$ in this case.)
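
The three values quoted for the asymmetric modularity can be reproduced with exact arithmetic, assuming the standard Newman-Girvan form $q_{ng} = \sum_a [l_a/L - (d_a/2L)^2]$, which matches the definitions of $L$, $d_a$ and $l_a$ given earlier. The function name and its parameters are hypothetical; the sketch only covers partitions that group equal numbers of consecutive cliques into at least two modules.

```python
from fractions import Fraction

def ring_of_cliques_qng(n_cliques, clique_size, cliques_per_module):
    """Newman-Girvan quality for a ring of cliques when every module is
    made of `cliques_per_module` consecutive cliques.

    Assumes cliques_per_module divides n_cliques and leaves >= 2 modules.
    """
    links_per_clique = clique_size * (clique_size - 1) // 2
    # Each clique brings its internal links plus one bridge to its neighbor.
    total_links = n_cliques * (links_per_clique + 1)
    k = cliques_per_module
    n_modules = n_cliques // k
    # Internal links: k cliques plus the k - 1 bridges between them.
    l_a = k * links_per_clique + (k - 1)
    # Total degree: twice the internal links plus the two outgoing bridges.
    d_a = 2 * l_a + 2
    return n_modules * (Fraction(l_a, total_links)
                        - Fraction(d_a, 2 * total_links) ** 2)

q_30 = ring_of_cliques_qng(30, 3, 1)  # C = 30, one clique per module
q_15 = ring_of_cliques_qng(30, 3, 2)  # C = 15, two cliques per module
q_10 = ring_of_cliques_qng(30, 3, 3)  # C = 10, three cliques per module
```

The results are 43/60, 97/120 and 49/60 respectively, so among these groupings the asymmetric quality function prefers merging three cliques per module over resolving the individual cliques.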

This high resolution even makes our method applicable
to hierarchical community networks, a challenge for
quality function methods due to the multiple scales
involved. In Fig. 4 we present the partition results for the
model hierarchical network suggested in [ll]: 256 nodes
are divided into 16 compartments of equal size at the
first level, and every 4 of them form a bigger compartment
at the second level. The internal degrees of nodes
at the first and second levels, $z_{in1}$ and $z_{in2}$, and the degree
$z_{out}$ for the links between the second-level communities keep an
average of $z_{in1} + z_{in2} + z_{out} = 18$ (hence the hierarchical
levels can be indicated by '$z_{in1}$-$z_{in2}$'). We find that
the hierarchical structures are well characterized by the
local maxima (also the sharp turning points) on the $M^C$
curve, which indicate the relevant scales ($C = 4$ and 16 in this
case) and a higher level in between. However, with the
asymmetric modularity method ($M_{ng}^C$) no signal for the
first-level communities ($C = 16$) can be recognized.

FIG. 4: (color online) The symmetric modularity of the
hierarchical network [ll] of type ($z_{in1}$-$z_{in2}$) 14-3 (a), 13-4 (b)
and 15-2 (c). The relevant hierarchical scales $C = 4$ and 16
can be related to the local maxima (also sharp turning points)
on the $M^C$ curve. The change of the $M^C$ values at the maxima
from (a) to (c) reflects the competition of the two scales.

FIG. 5: (color online) The average of $F^C$ (see Eq. (g)) and the
corresponding deviation $\delta_{F^C}$ (error bar) of the ER (a) and BA
(b) networks evaluated over eight different degree sequences.
The four sets of data from top to bottom correspond to $N$ =
40, 60, 90 and 135 in (a) and $m$ = 5, 4, 3 and 2 in (b), with $m$
the defining parameter of BA scale-free networks.

Second, the symmetric modularity does not take a large
value for a random network. Careful studies of Erdős-Rényi
(ER) and Barabási-Albert (BA) scale-free
networks [3] are summarized in Fig. 5. For an ER network
with $N$ nodes and connection probability $p$ studied
there, we have verified that $M^C$ is around zero and
$|M^C| \lesssim F^C$, as implied by the definition. Meanwhile, the
data suggest that $F^C$ may depend on the degree sequence,
but always takes its maximum value at $C = 2$.
For this reason we have considered eight different degree
sequences for each ($N$, $p$) pair and calculated their average
and the corresponding deviation (see Fig. 5(a)). It can
be seen that $\langle F^C \rangle$ is small and does not depend on $p$
significantly; more importantly, as $N$ increases it keeps
decreasing roughly as a power law ~ pf-0-'!5±Qm. This
suggests that community structure cannot be a general
property of ER networks. In addition, the dependence
of $F^C$ on the degree sequence is very weak ($\delta_{F^C} < 0.05$),
suggesting that the chance of finding meaningful community
structure in certain realizations of ER networks with
particular degree sequences is also very slim. The study of
the scale-free networks leads to the same results, except
that the power-law dependence of $\langle F^C \rangle$ on the network
size is roughly $\sim N^{-0.46 \pm 0.04}$ instead.

In summary, we suggest that the equivalence between the
topological information of a network and that of its complement
should be considered generally in the definition and detection
of network structures. As an important application
we have focused on the community partition problem
and proposed a symmetric quality function. The
resulting community detection scheme has a high resolution
and can be used to identify hierarchical community
structures. In addition, we have found that the effects
of fluctuations on the community structure are weak and
decrease as the size of the network increases. This implies
that the community structure is unlikely to be a result of
fluctuations when the network is large enough.
The question of whether there are other relevant symmetries
and how they may provide insights into network
structures is interesting and deserves further effort.

This work is supported by Defense Science and Tech-